Enhanced architecture and implementation of spectrum shaping codes

Spectral shaping codes are modulation codes widely used in communication and data storage systems. This research enhances the algorithms employed in constructing spectral shaping codes for hardware implementation. First, we present a parallel scrambling calculation with a time complexity of $O(1)$. Second, in the minimum accumulated signal power (MASP) module, the sine-cosine accumulation relies on remainder operations, which gives a time complexity of $O(n^2)$. We offer reduced MASP computations based on short bit-width data, ROM storage, and addition pipelines, which remove the remainder operation and reduce the accumulation complexity to $O(1)$. In addition, we present a search algorithm that generates segmented lines to replace the square operations in the MASP module. By employing the search algorithm and shift operations, we reduce the complexity of the square from $O(n^2)$ to $O(1)$. The implementation results reveal that the original and proposed MASPs yield nearly identical spectrum nulls. The encoder-decoder of the spectral shaping codes with the proposed approaches consumes just 6% of the hardware resources of a Spartan6 XC6SLX25.


INTRODUCTION
Spectrum shaping codes are categorized as modulation codes, and they were initially applied to digital communications that use transformers to connect two lines. Transformers cannot transfer signals without significant distortion if the power spectral densities of the signals include low-frequency components. Shaping codes are designed to adjust source data to satisfy the features of the communication channel. These codes are also employed in digital recording systems to translate an arbitrary data sequence into a sequence with the particular characteristics required by the systems (Immink & Cai, 2021). More recently, a novel concept of integrated microwave photonics spectral shaping was introduced to open avenues to advanced functionalities (Daulay et al., 2021).
Spectrum shaping technologies are utilized in a variety of fields. (1) In information processing and transmission fields, Chai et al. (2014) discuss the practical obstacles to implementing dynamic spectrum access (DSA) devices and offer solutions. In order to accommodate DSA in commercial off-the-shelf wireless devices, they also propose a general per-frame spectrum-shaping protocol. A simple spectrum shaping technique based on switching three loads has been presented for backscatter modulation-based Internet of Things (IoT) systems (Nagaraj, 2017). Danila (2021) describes theoretical research conducted in the terahertz G-band for a piezoelectrically-responsive ring-cone element metasurface composed of polyvinylidene fluoride (PVDF)/silicon and PVDF/silica glass. Utilizing the longitudinal piezoelectric effect of PVDF, this study examines the spectrum shaping ability of a polymer-based metasurface. Three distinct filter functions, namely Fano-like resonances, wavelength interleaving, and variable resonance mode splitting, are accomplished in Arianfard et al. (2021). The outcomes theoretically validate the proposed device as a compact photonic filter with many functions for adjustable spectral shaping. Dobre et al. (2021) developed spectrum-skirt-filled pulse-shaping filters corresponding to spectral mask response. The suggested system design achieves higher data rates in a dispersive microwave propagation environment than conventional transmission using Nyquist pulse shaping. Nasarre et al. (2021) present a novel concept of frequency-domain spectral shaping (FDSS) with spectral extension for the enhancement of the uplink (UL) coverage in 5G New Radio (NR), based on discrete Fourier transform spread orthogonal frequency-domain multiplexing. The results demonstrate that the spectrally-extended FDSS method is a highly effective solution for improving the 5G NR UL coverage. Furthermore, we can create dependable systems by integrating modern modulation techniques and rate-diverse error-correcting codes (Fang et al., 2023; Chen et al., 2023; Lin et al., 2023).
(2) In information storage fields, spectrum shaping codes have spectrum nulls at specific frequencies (Pelusi et al., 2015). In addition, the shaping codes are expected to enhance the performance of dedicated servo recording systems (Ng et al., 2015; Yuan et al., 2015), which is a promising technology for ultra-mobile hard disk drives. Shaping codes with spectrum nulls at non-zero frequencies effectively reduce interference between data signals and narrow-band signals. In a dedicated servo recording system, there are two frequencies for servo signals, i.e., a frequency of $f_1$ on even tracks and a frequency of $f_2$ on odd tracks. In addition to avoiding interference between data and servo signals, such codes also permit filtering of low-frequency disc noise. Moreover, the applied recording systems require a run-length limit constraint (Tandon, Motani & Varshney, 2019), also known as the k-constraint. Kahlman & Immink (1995) address the spectral shaping of both embedded pilot tones and spectral nulls in digital magnetic video tape recordings. The spectral notches are essential to prevent interference between the written data and the servo detecting mechanism.
(3) In medical fields, Greffier et al. (2020) investigate the influence of tin filter-based spectral shaping computed tomography (CT) on image quality and radiation dose for use in ultralow-dose CT protocols. Tin filtering enhances the quality of the X-ray beam and the image quality characteristics of phantom images. Baldi et al. (2020) suggest a spectral shaping and third-generation dual-source multidetector CT scanner for evaluating osteolytic lesions caused by multiple myeloma. The outcome validates the benefits of whole-body low-dose computed tomography for diagnosing patients with multiple myeloma. Agostini et al. (2021) investigate the function of third-generation iterative reconstruction (ADMIRE3) in a dual-source, high-pitch chest CT protocol with spectral shaping at 100 kVp for coronavirus disease 2019 (COVID-19). The low-dose CT with spectral shaping and ADMIRE3 provides acceptable image quality for evaluating COVID-19 patients while significantly reducing radiation dose and motion anomalies. Tin prefiltration, which hardens the X-ray beam, is established for imaging high-contrast subjects in energy-integrating detector computed tomography (EID-CT) (Grunz et al., 2022). This study aims to examine the potential dose-saving effect of spectral shaping via tin prefiltration in photon-counting detector CT (PCD-CT) of the temporal bone. For matched image noise, high-voltage scan methods with tin prefiltration enable more significant dose savings in EID-CT. However, superior inherent denoising reduces the dose reduction potential of spectral shaping in PCD-CT.
Based on the excellent performance of the research in Cai et al. (2017), this study presents the implementation of spectrum shaping codes on a field programmable gate array (FPGA). The shaping code architecture consists of scrambling and descrambling, k-constrained encoding and decoding (Immink, 2012), and a minimum accumulated signal power (MASP) module. We provide simplified approaches for these modules, which are suitable for hardware implementation. Scrambling is a highly effective technique (Park & Son, 2020; Xiao et al., 2020; Liu et al., 2021). In the proposed scrambling and descrambling, we use only XOR (exclusive or) logical operations and no arithmetic operations. The algorithm for k-constrained encoding and decoding is then described. Furthermore, we propose improved calculations to reduce parameter storage and processing complexity in the MASP module.
The study is organized as follows. In 'Shaping Code Algorithms', we describe the overall architecture of the FPGA system implementation and present the algorithms of spectrum shaping codes. 'The enhanced algorithms' presents the enhanced versions of these algorithms. 'FPGA implementation of a spectrum shaping code' demonstrates a specific hardware implementation of the shaping algorithms with reduced computations. The shaping code is synthesized, and the consumed resources are analyzed. 'Discussion and conclusion' discusses the results and concludes the study.

SHAPING CODE ALGORITHMS
Figure 1 illustrates a block diagram of an encoder and a decoder for spectrum shaping codes. In the encoder, the first step is to generate the numbers 0 to $2^p - 1$ in decimal form and convert them to a binary vector A of size $2^p \times p$, where p denotes the length of the scrambler. Then, we append A to the user data of length m bits, generating a vector B of size $2^p \times (m + p)$. Second, B is fed into a guided scrambler module and scrambled, which generates a vector C of size $2^p \times (m + p)$. Third, the scrambled vector C is encoded using a k-constrained encoder, yielding a vector D of size $2^p \times (m + p + 1)$. Finally, the accumulated signal power is calculated for each of D(0) to D($2^p - 1$), and the vector with the least power, T, is chosen and sent. In the decoding process, the received signal R with a bit-width of $(m + p + 1)$ is fed into the k-constrained decoder, which produces the data Y with a bit-width of $(m + p)$. By descrambling Y, we obtain the data Z with a bit-width of $(m + p)$. The original user data can be recovered by eliminating the redundant p bits.
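The candidate generation and selection steps of the encoder can be sketched in Python as follows. This is only an illustrative sketch: the function names `make_candidates` and `select_min_power` are hypothetical, the placement of the p redundant bits in front of the user data is an assumption (the text only states that they are appended), and `power` stands in for the accumulated signal power computed by the MASP module.

```python
from itertools import product


def make_candidates(user_bits, p):
    """Return the 2**p rows of B: every p-bit pattern joined with the user data."""
    # Assumption: the redundant bits are placed in front of the user data.
    return [list(prefix) + list(user_bits) for prefix in product((0, 1), repeat=p)]


def select_min_power(candidates, power):
    """Pick the candidate whose (caller-supplied) power measure is smallest."""
    return min(candidates, key=power)
```

With m = 3 and p = 2, `make_candidates` returns four candidates of five bits each, and the decoder can recover the user data by dropping the first two bits of the selected candidate.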

Simplified scrambling and unscrambling algorithms
In this study, the guided scrambler (GS) polynomial is defined in Eq. (1), where $g_0$ is a constant bit of value 1 and $g_i$, $0 < i < p$, is a binary bit of 1 or 0. The scrambled bits $c_i$ are generated from the input bits $b_i$ by the encoding in Eq. (2). Since $b_i$, $c_i$, and $g_j$ are binary values of 1 or 0, $0 \le i \le n - 2$, $0 \le j < p$, the product $g_{i-k-1} c_{i-k-1}$ is also binary. Moreover, the logical XOR operation can replace the addition involved in Eq. (2). XOR compares two bits and returns 1 if the two bits are different and 0 if they are equal (Qiu et al., 2024). An XOR operation between a variable and 0 returns the variable itself. Let $\oplus$ denote bitwise XOR. Therefore, we can rewrite Eq. (2) as Eq. (3). Since Eq. (3) has a recursive structure, we perform a serial implementation, which takes n clock cycles to complete. Thus, the time complexity of GS encoding is $O(n)$.
Next, we show the GS decoding process for restoring the original bit set $\{b_0, b_1, b_2, \ldots, b_{n-2}\}$. The GS encoding and decoding involve only XOR operations. To obtain $b_i$ from $c_i$, we XOR both sides of Eq. (2) or Eq. (3) with $b_i \oplus c_i$, which yields Eq. (4) for the values of $b_i$.
Comparing Eq. (2) with Eq. (4), we observe that we only need to exchange the positions of $c_i$ and $b_i$. Since every $c_i$ is an encoded bit and is therefore known in advance, GS decoding can be implemented in parallel.

The algorithm of minimum accumulated signal power
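As given in Cai et al. (2017), the MASP criterion is

$$\mathrm{MASP} = \sum_{s=1}^{t} \left| \sum_{i=1}^{(l-1)n} w_i^{*} e^{-j 2\pi f_s i} + \sum_{i=(l-1)n+1}^{ln} w_i e^{-j 2\pi f_s i} \right|^{2}, \quad (5)$$

where $j = \sqrt{-1}$, t is the number of spectrum nulls at frequencies $f_1, f_2, \ldots, f_t$, n is the length of one codeword, and l indicates the number of codewords that need to be computed. Here $w$ and $w^{*}$ denote the current unencoded codeword and the previously encoded codewords, respectively. Let

$$Re_s^{l-1} = \sum_{i=1}^{(l-1)n} w_i^{*} \cos(2\pi f_s i), \qquad Im_s^{l-1} = \sum_{i=1}^{(l-1)n} w_i^{*} \sin(2\pi f_s i), \quad (6)$$

where $Re_s^{l-1}$ and $Im_s^{l-1}$ are the sine-cosine accumulations of the $l - 1$ codewords and have already been calculated. In Eq. (5), the left part is already obtained and can be directly added to the right part. By application of Euler's formula, the MASP of the unencoded codeword is computed by

$$\mathrm{MASP} = \sum_{s=1}^{t} \left[ \left( Re_s^{l-1} + \sum_{i=(l-1)n+1}^{ln} w_i \cos(2\pi f_s i) \right)^{2} + \left( Im_s^{l-1} + \sum_{i=(l-1)n+1}^{ln} w_i \sin(2\pi f_s i) \right)^{2} \right]. \quad (7)$$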
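The sine-cosine form of the MASP computation can be illustrated with a short Python sketch. The function name `masp` and its parameters are hypothetical, the codeword entries are assumed to be +1 or -1, and the previous accumulations are passed in as plain lists; the sketch only shows how the stored real and imaginary accumulations are combined with the contribution of the current codeword before squaring.

```python
import math


def masp(w, re_prev, im_prev, freqs, first_index):
    """Accumulated signal power of a candidate codeword w (entries +1 or -1).

    re_prev[s], im_prev[s]: cosine and sine accumulations of earlier codewords
    for the null frequency freqs[s].
    first_index: stream position i of the first bit of w.
    """
    total = 0.0
    for s, f in enumerate(freqs):
        re, im = re_prev[s], im_prev[s]
        for k, wi in enumerate(w):
            angle = 2.0 * math.pi * f * (first_index + k)
            re += wi * math.cos(angle)  # real-part accumulation
            im += wi * math.sin(angle)  # imaginary-part accumulation
        total += re * re + im * im      # squared magnitude at this null
    return total
```

Among the candidate codewords, the encoder would keep the one for which this value is smallest.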

THE ENHANCED ALGORITHMS

Parallel scrambling algorithms
The GS scrambler polynomial is chosen as $1 + x^2$, for which p, $g_0$, and $g_1$ equal 2, 1, and 0, respectively. Based on the scrambling Eq. (2), the encoding is given by Eq. (8), where $c_0$ and $c_1$ are initially set to zero. This calculation is executed recursively in serial. In other words, one clock is taken to produce one $c_i$, and n clocks are needed to calculate all $c_i$. Thus, the time complexity of the initial scrambling polynomial is $O(n)$. To reduce this time consumption, we transform Eq. (8) into Eq. (9). The calculation in Eq. (9) is not recursive, since its right side, i.e., the input data b together with $c_0$ and $c_1$, is known in advance. We can therefore compute $c_3, c_4, \ldots, c_n$ independently and in parallel within one clock, with a time complexity of $O(1)$. That is, the time complexity is reduced from $O(n)$ in Eq. (8) to $O(1)$ in Eq. (9).
The corresponding decoding is given by Eq. (10). Recall that the encoding and decoding of scrambling use only XOR operations.
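The difference between the serial and the parallel form can be sketched in Python. The sketch assumes that the polynomial $1 + x^2$ leads to the recurrence $c_i = b_i \oplus c_{i-2}$ with an all-zero initial state, so the indexing is illustrative and may differ from the paper's equations; the function names are hypothetical. Unrolling the recurrence gives $c_i = b_i \oplus b_{i-2} \oplus b_{i-4} \oplus \cdots$, in which every output depends only on input bits and can therefore be computed independently.

```python
from functools import reduce
from operator import xor


def scramble_serial(b):
    """Recursive form: each output needs the output produced two steps earlier."""
    c = []
    for i, bit in enumerate(b):
        prev = c[i - 2] if i >= 2 else 0  # assumed zero initial state
        c.append(bit ^ prev)
    return c


def scramble_parallel(b):
    """Unrolled form: c[i] is the XOR of b[i], b[i-2], b[i-4], ... (inputs only)."""
    return [reduce(xor, b[i::-2]) for i in range(len(b))]


def descramble(c):
    """Decoding uses received bits only, so every b[i] can be computed at once."""
    return [c[i] ^ (c[i - 2] if i >= 2 else 0) for i in range(len(c))]
```

In software the list comprehension still runs sequentially; the point of the unrolled form is that in hardware each output can be realized as its own XOR tree and evaluated in the same clock cycle.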

Improved MASP with remainder operation
Solving for sine and cosine is a critical step in Eq. (7), and we propose a minimum accumulated signal power with remainder (MASP-R), which is stated as follows.
Step 1: Convert the radian value $2\pi f_s i$ to the degree value $h_1$, $(l - 1)n + 1 \le i \le ln$.
Step 2: $h_1$ may be greater than 360 degrees, so we need to take the remainder of $h_1$ modulo $360^\circ$. Furthermore, since $\sin(h_1) = \cos(h_1 + 270^\circ)$, we also need to calculate the remainder of $h_1 + 270^\circ$ modulo $360^\circ$. Thus, the sine and cosine of $2\pi f_s i$ can be given by

$$h_{r1} = h_1 \bmod 360^\circ, \quad h_{r2} = (h_1 + 270^\circ) \bmod 360^\circ, \quad \cos(2\pi f_s i) = \cos(h_{r1}), \quad \sin(2\pi f_s i) = \cos(h_{r2}).$$

Step 3: We construct a ROM and store the cosine values from 0 degrees to 359 degrees in the ROM. The cosine values of $h_{r1}$ and $h_{r2}$ are then read from the ROM.
Given the cosine values in the first quadrant, we can determine the values in the other three quadrants. The ROM therefore only needs to store 91 values, for 0 to 90 degrees, instead of 360 values, which saves about 3/4 of the storage space. When MASP solves for sine and cosine, it solves for $\cos(h_{r1})$ and $\cos(h_{r2})$. Next, we show an example of a modified cosine solution using $h_{r1}$.

$$\cos(h_{r1}) = \begin{cases} \cos(h_{r1}), & 0^\circ \le h_{r1} \le 90^\circ \\ -\cos(180^\circ - h_{r1}), & 90^\circ < h_{r1} \le 180^\circ \\ -\cos(h_{r1} - 180^\circ), & 180^\circ < h_{r1} \le 270^\circ \\ \cos(360^\circ - h_{r1}), & 270^\circ < h_{r1} < 360^\circ \end{cases}$$
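The MASP-R lookup can be sketched in Python under the assumption that the angles are whole degrees, which holds for frequencies such as 1/90 and 1/60 because $2\pi f i$ then corresponds to $4i$ and $6i$ degrees. The names `ROM`, `cos_deg`, and `sin_deg` are hypothetical, and floating-point values are stored here for clarity, whereas a hardware ROM would hold quantized integers.

```python
import math

# First-quadrant table only: 91 entries for 0..90 degrees.
ROM = [math.cos(math.radians(d)) for d in range(91)]


def cos_deg(h):
    """Cosine of an integer angle in degrees via remainder and quadrant folding."""
    h %= 360                      # the remainder operation of MASP-R
    if h <= 90:
        return ROM[h]
    if h <= 180:
        return -ROM[180 - h]      # second quadrant
    if h <= 270:
        return -ROM[h - 180]      # third quadrant
    return ROM[360 - h]           # fourth quadrant


def sin_deg(h):
    """Sine through the identity sin(h) = cos(h + 270 degrees)."""
    return cos_deg(h + 270)
```

For example, `cos_deg(200)` reads entry 20 and negates it, and `sin_deg(30)` is evaluated as `cos_deg(300)`, which reads entry 60.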
Improved MASP with no remainder and square

Remove the remainder operation
Next, we propose an improved MASP algorithm with no remainder and square (MASP-NRS). According to the MASP formula Eq. (7), we compute a 360-degree remainder, obtain the related sine-cosine value, and perform an addition. A sine-cosine accumulation requires n clocks. Parallel execution of the accumulation is complex, so serial operation is employed instead. The time complexity of the remainder operation is $O(n)$, and a sine-cosine accumulation requires n remainders, which leads to an accumulation time complexity of $O(n^2)$. To reduce this time complexity, we propose an algorithm that removes the remainder operation and includes the following improvements.
Improvement 1: Reduce the number of codewords involved in accumulation. The sums $\sum_{i=(l-1)n+1}^{ln} w_i \cos(2\pi f_s i)$ and $\sum_{i=(l-1)n+1}^{ln} w_i \sin(2\pi f_s i)$ of the current l-th codeword need to be added to the sums $\sum_{i=1}^{(l-1)n} w_i^{*} \cos(2\pi f_s i)$ and $\sum_{i=1}^{(l-1)n} w_i^{*} \sin(2\pi f_s i)$ of the previous $(l - 1)$ codewords. We have

$$Re_s^{l} = Re_s^{l-1} + \sum_{i=(l-1)n+1}^{ln} w_i \cos(2\pi f_s i), \qquad Im_s^{l} = Im_s^{l-1} + \sum_{i=(l-1)n+1}^{ln} w_i \sin(2\pi f_s i).$$

The values of $Re_s^{l}$ and $Im_s^{l}$ increase as the number of codewords increases. After accumulating 64 codewords, we reset $Re_s^{l}$ and $Im_s^{l}$ to 0 to limit these values.
Improvement 2: Eliminate the remainder operation and use ROM storage instead. Let the shaping code utilize two servo frequencies, $f_1$ and $f_2$. A codeword w of n bits is multiplied by four groups of sine-cosine values, $\cos(2\pi f_1 i)$, $\sin(2\pi f_1 i)$, $\cos(2\pi f_2 i)$, and $\sin(2\pi f_2 i)$, $0 \le i \le n - 1$. Each sine-cosine group contains n data points. A total of $64 \times 4 \times n$ sine-cosine values are stored.
Improvement 3: Adopt a small bit-width. The initial sine-cosine values need to be transformed from decimals to integers so that they can be processed on the FPGA. We multiply the initial sine-cosine values by 15 and round to the nearest integer, which is approximately equivalent to shifting the values left by four bits. As a result, including the sign bit, the bit width of the four groups of sine-cosine values is 5, with a maximum magnitude of 15.
Improvement 4: Parallel operation. Each item in codeword w has a value of either -1 or 1, so multiplying w by the sine and cosine values reduces to select operations. We can fetch n sine-cosine values from the ROM at the same time and perform the selections in parallel, completing the sine-cosine accumulation in a single clock cycle. Thus, we eliminate the remainder operation, reducing the accumulation time complexity from $O(n^2)$ to $O(1)$.
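Improvements 2 to 4 can be illustrated with a small Python sketch. The helper names are hypothetical, only one group of n values is built (the full ROM would hold such groups for 64 codeword positions and four sine-cosine functions), and the frequency 1/90 is taken from the implementation example later in the paper.

```python
import math


def quantize(value, scale=15):
    """Scale a sine or cosine value and round it to a 5-bit signed integer."""
    return round(scale * value)


def build_table(f, n, fn):
    """Precompute n quantized samples of fn(2*pi*f*i), i = 0..n-1 (ROM contents)."""
    return [quantize(fn(2.0 * math.pi * f * i)) for i in range(n)]


def select_accumulate(w, table):
    """Multiply by +1/-1 through selection: keep the stored value or its negation."""
    return sum(v if wi == 1 else -v for wi, v in zip(w, table))
```

Because the table is indexed directly by bit position, no remainder is computed at run time, and the n selections are independent of each other, which is what allows them to be performed in the same clock cycle in hardware.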
Remove the square operation
Equation (7) involves a square operation, which has a computational cost of $O(n^2)$ and is challenging to implement. We provide a segmented line search algorithm with dynamical error. The search algorithm seeks segmented points, which are connected to produce a segmented curve. Applying the curve, we obtain an approximation of the square. Evaluating this curve only involves deterministic shifts and additions/subtractions, with a complexity of $O(1)$. The complexity of the proposed search algorithm is two orders of magnitude lower than that of the square operation. The key features of the algorithm are the use of dynamical error and the balanced coefficient of the mean square error. The search algorithm is described in Algorithm 1.
The mean square error is generally calculated with Eq. (14), where $\hat{f}(x_i)$ represents the predicted value and $f(x_i)$ the actual value.
Items with large variation dominate the error expression Eq. (14), whereas items with small variation have only a minor impact. So, in Algorithm 1, we propose a balanced-coefficient mean square error expression, Eq. (15), to describe the importance of each item more accurately, where $\alpha$ and $\beta$ are called the fast and slow decay factors, respectively.

Algorithm 1 Segmented line search algorithm with dynamical error
Require: f(x): the square function; x: the independent variable;
Ensure: a set of segmented points;
...
9: end while
10: x_b = x_v;
11: x_v = x_{v+2};
12:
13: else
14: x_b = x_b;
15: x_v = x_j;
16: end if
17: end for
18: return sp = [sp_0, sp_1, sp_2, ...];

Multiple segmented lines, Eq. (16), can be generated when the segmented points are provided. The product of x and $k_0, k_1, \ldots$ can be replaced by shift operations on x. The complexity of calculating $x^2$ is $O(n^2)$, whereas applying Eq. (16) in combination with the shift operation to compute the square of x decreases the complexity to $O(1)$. As a result, we can rewrite Eq. (7) as Eq. (17). The values of the cosine and sine functions range from -1 to 1 in Eq. (17). The terms $\left(Re_s^{l-1} + \sum_{i=(l-1)n+1}^{ln} w_i \cos(2\pi f_s i)\right)^2$ and $\left(Im_s^{l-1} + \sum_{i=(l-1)n+1}^{ln} w_i \sin(2\pi f_s i)\right)^2$ can become large when the length n of a codeword is large and the binary bits $w_i$ are all positive 1. Note that the MASP values are used only for comparison between candidates. Thus, we can scale down both trigonometric sums by the same factor without affecting the comparison. Eq. (17) is then modified into Eq. (18), where num is an integer.
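To make the idea of replacing the square with shifts and additions concrete, the following Python sketch approximates $x^2$ for non-negative integers with chords between breakpoints at powers of two. This is not Algorithm 1: the breakpoints are chosen by hand rather than by the search procedure, and the name `approx_square` is hypothetical. On the interval $[2^k, 2^{k+1}]$ the chord has slope $2^k + 2^{k+1}$ and intercept $-2^{2k+1}$, so it needs only two shifts, one addition, and one subtraction.

```python
def approx_square(x):
    """Piecewise-linear estimate of x*x for integers x >= 0, using shifts only."""
    if x < 4:
        return x << 2                       # chord from (0, 0) to (4, 16)
    k = x.bit_length() - 1                  # segment index: 2**k <= x < 2**(k + 1)
    # Chord through (2**k, 4**k) and (2**(k+1), 4**(k+1)).
    return (x << (k + 1)) + (x << k) - (1 << (2 * k + 1))
```

The estimate is exact at the breakpoints, never smaller than the true square, and non-decreasing in x, so larger inputs never receive a smaller estimate. A searched set of segmented points, as in the paper, would be expected to give a tighter fit than this hand-picked partition.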

FPGA IMPLEMENTATION OF A SPECTRUM SHAPING CODE
Here, we employ a specific shaping code as an example of FPGA implementation. Let the lengths of the shaping code and the original message be 80 and 77 bits, respectively. Then we get L = 16 and q = 80/L = 5. The GS scrambler polynomial is $1 + x^2$, and the length of the scrambler is 2. Two bits are chosen from the binary set {00, 01, 10, 11} and appended to the original message. Next, the resulting 79 bits are scrambled using the parallel scrambling algorithm described in 'Parallel scrambling algorithms'. After that, we add a bit 1 to the scrambled 79 bits to create 80 bits. The k-constrained and MASP algorithms are then executed.

The implementation of MASP-NRS
The implementation of removing remainder operation
Let the shaping code utilize two servo frequencies, $f_1 = 1/90$ and $f_2 = 1/60$. Each sine-cosine group contains 80 data points. We store $64 \times 4 \times 80$ sine-cosine values of 5 bits each, requiring a total of $64 \times 4 \times 80 \times 5$ bits = 12.5 KB.
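The storage figure can be checked with a short calculation, assuming 8 bits per byte and 1 KB = 1024 bytes:

```latex
64 \times 4 \times 80 \times 5\ \text{bits} = 102{,}400\ \text{bits} = 12{,}800\ \text{bytes} = 12.5\ \text{KB}
```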
Next, using a pipelined operation, we implement the sine-cosine accumulation in Eq. (18). Figure 2 illustrates the pipeline structure.
Step 1: In Fig. 2, we use r_f$_k$cos_w$_i$ and r_f$_k$sin_w$_i$ to denote the product of $w_i$ with $\cos(2\pi f_k i)$ and $\sin(2\pi f_k i)$, k = 1, 2, and $0 \le i \le 79$. According to the value of $w_i$, we use selectors to determine the 80 values of r_f$_1$cos_w$_i$, $0 \le i \le 79$.
Step 2: Accumulate r_f$_1$cos_w$_i$. If 80 data points are added two by two, the four-stage accumulation needs 40, 20, 10, and 5 addition operations, respectively. Thus, five operands remain after the four-stage addition, and adding these five operands two by two is inconvenient. Because r_f$_1$cos_w$_i$ has a small five-bit width, we instead construct a segmented accumulation equation, Eq. (19). Applying this equation, the first-stage accumulation of the 80 data points requires only 32 addition operations.
Step 4: Following accumulation, an operation that replaces the square is performed, which is introduced in the next section. The accumulation of an encoded codeword is completed after 14 cycles. At the 5th clock, the calculation of the next encoded codeword begins. In Fig. 2, the buffer indicates a cache of one clock.
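The stage counts quoted for the two-by-two accumulation can be reproduced with a small Python sketch of a plain pairwise adder tree. It does not implement the segmented accumulation of Eq. (19); the function name `adder_tree` is hypothetical, and an unpaired operand is simply carried to the next stage.

```python
def adder_tree(values):
    """Sum values pairwise, stage by stage; return (total, additions per stage)."""
    level = list(values)
    additions = []
    while len(level) > 1:
        pairs = len(level) // 2
        nxt = [level[2 * k] + level[2 * k + 1] for k in range(pairs)]
        if len(level) % 2:          # odd operand count: carry the last one over
            nxt.append(level[-1])
        additions.append(pairs)
        level = nxt
    return level[0], additions
```

For 80 operands the first four stages use 40, 20, 10, and 5 additions and leave five operands, which is the awkward odd remainder mentioned in Step 2; the tree then needs three further stages to finish.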

The implementation of removing square operation
For Eq. (18), a symbol contains 80 bits. Due to the k-constrained algorithm, neither five consecutive 1's nor five consecutive 0's occur, and a symbol contains no more than 80 × 80% 1's. In addition, the sine-cosine values are represented by integers in the range of 0 to 15. In the extreme case, 80 × 80% 1's are multiplied with these sine-cosine values. Taking the sine-cosine values involved in the multiplication at their mean value, 7.5, the multiplication result is 80 × 80% × 7.5. The result of the current symbol needs to be added to that of the previous symbol, so the accumulated result can reach 80 × 80% × 7.5 × 2 = 960. To simplify calculating the square of a large number, the num in Eq. (18) is set to 16. Thus, 80 × 80% × 7.5 × 2/16 = 960/16 = 60. The division by 16 yields the same result as a 4-bit right shift. To prevent accumulated results divided by 16 from exceeding 60, we add an overflow control: any result greater than 60 is set to 60.
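The scaling and overflow control described above can be sketched in Python. The function name and its default parameters are hypothetical, and taking the magnitude before shifting is an assumption made here because the value is squared afterwards.

```python
def scale_and_saturate(acc, shift=4, limit=60):
    """Divide |acc| by 2**shift using a right shift and clamp the result to limit."""
    # Assumption: only the magnitude matters, since the value is squared next.
    return min(abs(acc) >> shift, limit)
```

With the worst-case estimate of 960 the function returns 60, and any larger accumulation is clamped to the same value, so the input to the square approximation stays within 0 to 60.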
As shown in Fig. 3, we compare the segmented line function $\hat{f}(x)$ with the square function $f(x)$. The two curves exhibit a high degree of concordance, suggesting a strong resemblance between them. Using Eq. (23), the correlation coefficient rela between the estimated and actual square values equals 1, where $E[f(x_i)]$ denotes the expected actual value and $E[\hat{f}(x_i)]$ denotes the expected estimated value. Then, we define a variable td according to Eq. (24), consult the t-distribution table, and obtain a p-value of 0, which is less than the significance level (0.05). As a result, the correlation coefficient rela is regarded as significant, and $\hat{f}(x)$ and $f(x)$ are completely correlated.
Next, we examine the $R^2$ relationship between $\hat{f}(x)$ and $f(x)$ as stated in Eq. (25). The calculated value of $R^2$ is zero, demonstrating that the variance of the difference between $\hat{f}(x)$ and $f(x)$ is 0% of the variance of $f(x)$. The variance of the difference between $\hat{f}(x)$ and $f(x)$ is extremely small, indicating that $\hat{f}(x)$ and $f(x)$ are quite close in value.

Implementation result
We implement the design on a Spartan6 XC6SLX25 FPGA. Table 1 lists the resources consumed by the spectrum shaping encoder based on MASP-R and MASP-NRS. These two MASPs employ the same decoding technique, and the hardware resources of the decoder are detailed in Table 2. The encoder consumes more resources than the decoder, since the encoder implements the MASP-R/MASP-NRS algorithms. The encoder consumes 1,500 more slice registers than the decoder. It also consumes twice as many LUT slices as the decoder, because MASP-R/MASP-NRS needs combinational logic such as addition. In particular, the encoder with MASP-R employs two DSPs to calculate the remainder and square operations in Eq. (7), whereas the encoder with MASP-NRS [...]
Figure 4 shows the power spectrum densities for the same spectrum code. The dashed curve corresponds to the result of the initial MASP given in Eq. (7), while the solid curve represents the result of the MASP-NRS. Both curves use a code length of 80 bits, and the encoding and decoding methods are the same except for the accumulated signal power method and the scrambling. In Fig. 4, the MASP generates spectrum nulls of -22.8 dB at frequency 1/90 and -20.0 dB at frequency 1/60. The improved algorithm MASP-NRS obtains spectrum nulls of -22.5 dB and -19.4 dB at frequencies 1/90 and 1/60, respectively. The spectrum nulls of MASP-NRS are 98.7% and 97.0% of those of the MASP, with losses of 1.3% and 3% due to truncation operations in MASP-NRS.

DISCUSSION AND CONCLUSION
In this research, we enhance the encoder-decoder algorithms for spectrum shaping codes in order to facilitate hardware implementation. We improve the scrambling algorithm and provide a mathematical description of the k-constrained algorithm. For both scrambling and descrambling, we employ parallel operations that can be executed within a single clock cycle. We propose an enhanced MASP-R algorithm that computes remainder operations for the sine-cosine accumulation; however, its execution in parallel is challenging due to its significant time complexity. Thus, we further present a MASP-NRS algorithm that quantizes the sine-cosine values with a short bit-width and stores them in ROM, eliminating the remainder operation. In particular, the MASP-NRS allows parallel operations for the sine-cosine accumulation within a single clock, which resolves the parallelization issue of the initial MASP. Furthermore, we put forward a search algorithm that utilizes two approaches: dynamical error and balanced-coefficient mean square error. The search algorithm generates a curve $\hat{f}(x)$ similar to the square function $f(x)$. Correlation and $R^2$ analysis indicate that $\hat{f}(x)$ and $f(x)$ are almost equivalent. The complexity is reduced by two orders of magnitude by substituting $\hat{f}(x)$ for the square operation in MASP. Finally, the encoder-decoder of the shaping codes is implemented on the Spartan6 XC6SLX25. The synthesis results show that the decoder is simpler than the encoder, since it does not have to calculate the accumulated signal power. Furthermore, we demonstrate that the performance of the initial MASP and MASP-NRS is nearly identical, yielding spectrum nulls of approximately -22.8 dB, which confirms the accuracy of the proposed algorithm.

Figure 1 Schematic diagram of the spectrum shaping code with guided scrambling. DOI: 10.7717/peerjcs.1883/fig-1

$\{b_0, b_1, b_2, \ldots, b_{n-2}\}$ is the bit set to be scrambled, and $\{c_0, c_1, c_2, \ldots, c_{n-2}\}$ represents the scrambled bit set, where $b_i$ and $c_i$ are binary bits, $0 \le i \le n - 2$. Each value of $c_0, c_1, \ldots, c_{p-1}$ is initialized to zero. The bits $c_p, c_{p+1}, \ldots, c_{n-2}$ are generated by the encoding in Eq. (2).